
16 research outputs found

Factorization of Discriminatively Trained i-vector Extractor for Speaker Recognition

Author: Burget Lukas
Glembek Ondrej
Novotny Ondrej
Plchot Oldrich
Publication date: 05/04/2019

In this work, we continue our research on the i-vector extractor for speaker verification (SV) and optimize its architecture for fast and effective discriminative training. We were motivated by the computational and memory requirements caused by the large number of parameters of the original generative i-vector model. Our aim is to preserve the power of the original generative model while focusing the model on the extraction of speaker-related information. We show that it is possible to represent a standard generative i-vector extractor by a model with significantly fewer parameters and obtain similar performance on SV tasks. We can further refine this compact model by discriminative training and obtain i-vectors that lead to better performance on various SV benchmarks representing different acoustic domains. Comment: Submitted to Interspeech 2019, Graz, Austria. arXiv admin note: substantial text overlap with arXiv:1810.1318

arXiv.org e-Print Archive

Crossref

BAT System Description for NIST LRE 2015

Author: Brummer Niko
Burget Lukas
Cumani Sandro
Fer Radek
Glembek Ondrej
Grezl Frantisek
Karafiat Martin
Kesiraju Santosh
Li Ruizhi
Mallidi Sri Harish
Matejka Pavel
Novotny Ondrej
Ondel Lucas
Pesan Jan
Plchot Oldrich
Swart Albert
Vesely Karel
Publication venue: ISCA

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

BUT text-dependent speaker verification system for SdSV challenge

Author: Burget Lukáš
Glembek Ondrej
Lozano Díez Alicia
Matejka Pavel
Novotný Ondrej
Plchot Oldrich
Pulugundla Bhargav
Rohdin Johan
Silnova Anna
Veselý Karel
Publication venue: International Speech Communication Association
Publication date: 29/10/2020

In this paper, we present the winning BUT submission for the text-dependent task of the SdSV challenge 2020. Given the large amount of training data available in this challenge, we explore successful techniques from text-independent systems in the text-dependent scenario. In particular, we train x-vector extractors on both in-domain and out-of-domain datasets and combine them with i-vectors trained on concatenated MFCCs and bottleneck features, which have proven effective for the text-dependent scenario. Moreover, we propose the use of a phrase-dependent PLDA backend for scoring and its combination with a simple phrase recognizer, which brings up to 63% relative improvement on our development set with respect to using standard PLDA. Finally, we combine our different i-vector and x-vector based systems using a simple linear logistic regression score-level fusion, which provides 28% relative improvement on the evaluation set with respect to our best single system. The work was supported by Czech Ministry of Interior projects Nos. VI20152020025 “DRAPAK” and VI20192022169 “AI v TiV”, Czech National Science Foundation (GACR) project “NEUREM3” No. 19-26934X, European Union’s Marie Sklodowska-Curie grant agreement No. 843627, European Union’s Horizon 2020 project no. 833635 - ROXANNE and by Czech Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science” - LQ1602 and project no. LTAIN19087 “Multi-linguality in speech technologies

Crossref

Biblos-e Archivo

BAT System Description for NIST LRE 2015

Author: Brummer Niko
Burget Lukas
Cumani Sandro
Fer Radek
Glembek Ondrej
Matejka Pavel
Novotny Ondrej
Pesan Jan
Plchot Oldrich
Publication venue: International Speech Communication Association
Publication date: 01/01/2016

Crossref

PORTO Publications Open Repository TOrino

The Kaldi Speech Recognition Toolkit

Author: Boulianne Gilles
Burget Lukas
Ghoshal Arnab
Glembek Ondrej
Goel Nagendra
Hannemann Mirko
Motlicek Petr
Povey Daniel
Qian Yanmin
Schwarz Petr
Silovsky Jan
Stemmer Georg
Vesely Karel
Publication venue: Rue Marconi 19, Martigny, Idiap
Publication date: 19/12/2013

We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

Infoscience - École polytechnique fédérale de Lausanne

Approaches to automatic lexicon learning with limited training examples

Author: Agarwal Mohit
Akyazi Pinar
Burget Lukas
Feng Kai
Ghoshal Arnab
Glembek Ondrej
Goel Nagendra
Karafiat Martin
Povey Daniel
Rastrow Ariya
Rose Richard C.
Schwarz Petr
Thomas Samuel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/11/2014

Preparation of a lexicon for speech recognition systems can be a significant effort in languages where the written form is not exactly phonetic. On the other hand, in languages where the written form is quite phonetic, some common words are often mispronounced. In this paper, we use a combination of lexicon learning techniques to explore whether a lexicon can be learned when only a small lexicon is available for bootstrapping. We discover that for a phonetic language such as Spanish, it is possible to do this better than is possible with generic rules or hand-crafted pronunciations. For a more complex language such as English, we find that it is still possible but with some loss of accuracy.

Infoscience - École polytechnique fédérale de Lausanne

SUBSPACE GAUSSIAN MIXTURE MODELS FOR SPEECH RECOGNITION

Author: Agarwal Mohit
Akyazi Pinar
Burget Lukas
Feng Kai
Ghoshal Arnab
Glembek Ondrej
Goel Nagendra Kumar
Karafiat Martin
Povey Daniel
Rastrow Ariya
Rose Richard C.
Schwarz Petr
Thomas Samuel
Publication date: 01/01/2010

We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.

Crossref

Hong Kong University of Science and Technology Institutional Repository